home
***
CD-ROM
|
disk
|
FTP
|
other
***
search
/
Workbench Add-On
/
Workbench Add-On - Volume 1.iso
/
BBS-Archive
/
Dev
/
Obrn-A_1.6_src.lha
/
oberon-a
/
texts2.lha
/
texts
/
SymbolFiles.Text
< prev
next >
Wrap
Text File
|
1994-02-09
|
55KB
|
1,367 lines
Departement Informatik
Institut für Computersysteme
Eidgenössische Technische Hochschule
Zürich
Robert Griesemer
On the Linearization of Graphs and Writing Symbol Files
Cuno Pfister (ed.)
Oberon Technical Notes
Beat Heeb
Josef Templ
March 1991
Address of the authors:
Computersysteme
ETH-Zentrum
CH-8092 Zurich, Switzerland
e-mail:
griesemer@inf.ethz.ch
pfister@inf.ethz.ch
Copyright (c) 1991 Departement Informatik, ETH Zürich
On the Linearization of Graphs and Writing Symbol Files
Robert Griesemer
Abstract
The linearization of general graphs represented by pointer and record
data structures is a problem often arising in computer programs.
Whenever a graph has to be stored on or transmitted over a sequentially
organized carrier, a form of linearization is used. A simple algorithm
for this purpose is presented and a special application - the writing of
symbol files as required by modern language compilers - is described in
more detail.
Keywords: Linearization, Graphs, Symbol Files.
1. Linearization of Graphs
General graphs occur in various forms in computer science, sophisticated
data structures may often be interpreted as graphs. Whenever such a data
structure has to be stored on a file or has to be transmitted over a
network, the linearization problem occurs. For special forms of
non-linear data structures (e.g. trees) well-known solutions exist and
are comprehensively described in the basic literature. Nevertheless, for
more general data structures the wheel has to be reinvented and
literature is difficult to find [ReMö86]. The problem has a twofold
nature: firstly, the graph has to be linearized by means of a write
algorithm , and secondly, it has to be rebuilt out of its linear
description by a read algorithm . In the following, linearization means
writing a graph to a file.
1.1 Preconditions
First of all, we remember that every general (undirected) graph may be
represented by a directed graph, by means of having two directed edges
instead of an undirected one. Hence, we concentrate on directed graphs
only. Secondly, we consider only rooted and connected graphs; i.e.
graphs with a special root node and a path from the root to every other
node. Unconnected graphs or graphs with several root nodes are easily
extended such, that they obey the above preconditions. Thirdly, as an
(academic) restriction only finite graphs are considered.
In computer programs, graphs may be represented in various forms,
depending on the available notational support (the programming language)
and the way the data structure is used (the program). Here we consider
only graphs described in terms of pointers and records; i.e. every graph
node is represented by a record and every edge is represented by a
pointer. Note that possible information attached to the edges of a graph
( labeled edges ) may be interpreted as additional node data.
The nodes of such a graph may not all have the same type; i.e. the
amount of data in every node may be different and the number of directed
edges to other nodes may vary. It depends on the application and
available language how different nodes are specified. For the sake of
simplicity we identify each node with a positive tag -number (> 0), so
that every node type is clearly determined by its tag and vice versa.
Then, every node contains a certain amount M(tag) of data and (at most)
a certain number N(tag) of directed edges to other nodes. Hence, a
general graph node and its edges may be described in the following way
(written in Oberon [Wi88]):
TYPE
Node = POINTER TO NodeDesc;
NodeDesc = RECORD
tag: INTEGER; (* determines node type; tag > 0 *)
data: ARRAY M(tag) OF INTEGER; (* M depends on the node type (tag) *)
link: ARRAY N(tag) OF Node (* N depends on the node type (tag) *)
END;
Note that any data structure represented in any way in computer memory
may be written to a file by simply writing out sequentially the
corresponding memory area. Reading requires "only" the data to be read
into the same memory area (otherwise pointers would be incorrect). But
normally this is an inappropriate solution for several reasons: often
the data structure is distributed over the entire available memory and
writing would lead to an immense waste of space. For reading, the
destination memory addresses must be specified which is almost never
possible and a file created this way is inherently not portable (to
another computer system).
Hence, writing of graphs requires pointers to be transformed into
another form of reference and vice versa for reading. In addition,
reading and writing should be as efficient as possible considering both
memory and time; e.g. each node description should appear once and only
once on the file.
1.2 The Write Algorithm
The write algorithm resembles closely a naive recursive mark-algorithm
for garbage collectors, actually it is nearly the same task to be done:
all nodes have to be traversed and marked; marking is necessary to avoid
a second traversal and to refer to the node's first occurence. While
traversing the graph its structure is written to a file. Hence, the node
data structure has to be extended by a mark field, which initially must
be zero:
TYPE
Node = POINTER TO NodeDesc;
NodeDesc = RECORD
mark: INTEGER; (* used for linearization; initially = 0 *)
tag: INTEGER; (* determines node type; tag > 0 *)
data: ARRAY M(tag) OF INTEGER; (* N depends on the node type (tag) *)
link: ARRAY N(tag) OF Node (* M depends on the node type (tag) *)
END;
We say a node q is marked iff q.mark > 0. The WriteNode procedure (see
below) traverses the graph in pre-order, starting with the root node.
Whenever an unmarked node is encountered, the node is numbered and thus
marked (line 4), the node tag and data are written (lines 4 and 5) and
its successor nodes are traversed (line 6). Note that a node must be
numbered before its subtrees are traversed because they may refer to it.
Numbering is done using the counter NofNodes which is one initially and
is then incremented by one with every processed node. Hence, NofNodes is
always greater than zero. Whenever an already marked node is traversed,
only its negative node number is written out as reference number (line
2). If the node doesn't exist (i.e. q = NIL) zero is written out (line
1):
VAR
NofNodes: INTEGER; (* node number; initially = 1 *)
PROCEDURE WriteNode(q: Node);
VAR i: INTEGER;
BEGIN
(1) IF q = NIL THEN WriteInt(0)
(2) ELSIF q.mark > 0 THEN (* already marked *) WriteInt(-q.mark)
(3) ELSE (* first occurence *)
(4) WriteInt(q.tag); q.mark := NofNodes; INC(NofNodes);
(5) i := 0; WHILE i < M(tag) DO WriteInt(q.data[i]); INC(i) END;
(6) i := 0; WHILE i < N(tag) DO WriteNode(q.link[i]); INC(i) END
(7) END
END WriteNode;
During execution of WriteNode with argument root all nodes are numbered
in the order of their first occurence. References to already written
nodes are the negative node numbers. While reading the file this
information will be used to reconstruct the graph (see 1.3 The Read
Algorithm). Before WriteNode can be called a second time for the same
graph or parts of it, the preconditions have to be established again,
i.e. the nodes must be unmarked. When using time-stamps, no such unmark
pass is necessary (due to an idea of J. Templ, see [PfiHeTe91]: A
Symmetric Solution to the Load/Store Problem ). The WriteNode procedure
leads to the following file structure (in EBNF, terminal symbols are
described in double quotes):
Graph = NodeRef | NoNode | NodeDesc.
NodeRef = "negative node number (< 0)".
NoNode = "0".
NodeDesc = "node tag (> 0)" "node data" {Graph}.
It is obvious that the algorithm terminates for all graphs: there is
only a finite number of nodes and every procedure call works on at least
one node. The recursion is stopped when an already marked node and hence
a loop is found (line 2) and the algorithm returns to its predecessor
node. In line 4 the assignment q.mark := NofNodes is executed only if
q.mark = 0, i.e. each node is numbered at most once (because NofNodes >
0). On the other hand, since after numbering of the node q^ (line 4)
WriteNode numbers all subgraphs of q^ (line 6), each node is numbered at
least once. In combination we may conclude that each node is marked
exactly once. Finally, because each node's data is written iff the node
was numbered immediatly before (line 5), it becomes clear that each node
is written only once.
Note that WriteNode is a one-pass algorithm and that no additional data
structures except NofArgs and the procedure activation stack are
required. The run time complexity is O(n) where n is the number of edges
in the graph. As we mentioned above, WriteNode resembles closely the
mark phase of a mark-and-sweap garbage collector, therefore it is
possible to transform the algorithm into a completely iterative form
(e.g. if stack space is critical). The analoguous transformations for a
simple garbage collector are described in [PfiHeTe91].
1.3 The Read Algorithm
The read algorithm reconstructs a (sub-) tree from its description on a
file. Each node block in the file starts with an identification tag.
According to the EBNF file structure there are three cases to
distinguish: the node tag may be zero, negative or positive (excluding
zero).
VAR
NofNodes: INTEGER; (* node number; initially = 1 *)
NodeTab: ARRAY MaxNodes OF Node; (* node table; NodeTab[0] = NIL *)
PROCEDURE ReadNode(VAR q: Node);
VAR tag, i: INTEGER;
BEGIN
(1) ReadInt(tag);
(2) IF tag <= 0 THEN (* node reference *) q := NodeTab[-tag]
(3) ELSE (* first occurence *)
(4) NEW(q, tag); NodeTab[NofNodes] := q; INC(NofNodes);
(5) i := 0; WHILE i < M(tag) DO ReadInt(q.data[i]); INC(i) END;
(6) i := 0; WHILE i < N(tag) DO ReadNode(q.link[i]); INC(i) END
(7) END
END ReadNode;
The first and second case are handled together by initializing the
(otherwise unused) NodeTab entry zero to NIL. Because an empty subtree
is identified by a zero tag, q then automatically becomes NIL (line 2).
A node that occurs for the first time is identified by a positive number
which is also its node tag. Then, such a node is created using NEW and
stored in a global node table NodeTab (line 4). Note again that this
assignment must be done before any subtrees are read (they may
potentially refer to it). After creation of the new node its data and
its subtrees are read (line 5 and 6). A negative node tag refers to an
already read node, with the node number -tag. This node is obtained from
the NodeTab array (line 2).
It should be clear that this algorithm rebuilds the original graph, as
each line in ReadNode has its counterpart in WriteNode and each call of
WriteNode during writing implies a corresponding call of ReadNode during
reading. Clearly this algorithm is also O(n) where n is the number of
edges in the graph. A temporary node array of size n is used for
reading, this might be a drawback in case of very large graphs. A 2-pass
algorithm which needs temporary storage only for nodes which are
referenced more than once is described in the appendix.
2. Writing Symbol Files
Today's modular programming languages like Modula-2, Ada, Oberon and
others allow an application to be programmed in several more or less
independent parts, so-called modules (or packages). Such a module
typically defines data structures and provides operations (procedures)
on them, these exported objects may be used ( imported ) by other
modules without knowledge of their implementation.
Therefore, the information about a module's interface has to be
available to the compiler in an efficient way: in Modula-2 and Oberon
the necessary data is recorded on so-called symbol files. For
historical reasons and to avoid confusion we use the term symbol file
for any linear data structure which describes a module's interface.
Actually, it is not necessary that symbol files are represented by a
separate file (see 2.4 Implementation Aspects).
During compilation, the information about a module's interface is held
in the symbol table of the compiler. The symbol table can be regarded as
a graph, hence writing a symbol file requires linearizing the necessary
partial graph of this table. The methods described in the following have
been implemented in a compiler for an experimental Oberon-like language
[Gri90]. The code fragments detail only the principal structure, a
complete implementation may be found in the appendix.
2.1 Structure of Symbol Tables
The symbol table of a compiler for an Oberon-like language describes the
compiled objects, which are constants, types, variables, procedures, or
modules. The table itself is built when processing declarations. These
may be nested, so the compiler has to manage a stack of scopes, i.e.
visibility ranges of objects. For our purposes, it is sufficient to
store the objects of a scope in a linear list. More sophisticated
implementations would use structures like binary trees, because objects
have to be searched efficiently during compilation. In Modula-2 and
Oberon only global objects may be exported, i.e. only objects within the
global scope of the symbol table have to be considered for export (Fig.
1). Hence, writing the symbol file means linearizing the partial graph
described by the exported (and therefore in some way marked) objects of
this scope.
Fig. 1
Global Scope
Each node represents an object of the compiled program, containing all
the necessary information about it, such as its name, the object kind,
possibly an address and its type. Object types itself may have a very
complex structure and are further described using a Struct data
structure, which again may refer to other Struct nodes [Wi85, Cre90].
Hence objects and types are described with two different record
structures:
CONST
(* object modes *)
Undef* = 0; Scope = 1; Const* = 2; Type* = 3; Var* = 4; VarPar* = 5;
XVar* = 6; IVar* = 7; Field* = 8; LProc* = 9; XProc* = 10; IProc* = 11;
SProc* = 12; Mod* = 13;
(* type forms *)
Char* = 2; Bool* = 3; SInt* = 4; Int* = 5; LInt* = 6; Set* = 7;
Real* = 8; LReal* = 9; Cmplx* = 10; LCmplx* = 11; NilTyp* = 13;
NoTyp* = 14; Proc* = 15; String* = 16; Array* = 17; DynArr* = 18;
Record* = 19; Pointer* = 20;
TYPE
Name* = ARRAY 32 OF CHAR;
Object* = POINTER TO ObjectDesc;
Struct* = POINTER TO StructDesc;
ObjectDesc* = RECORD
link*, next*: Object; (* objects are chained using next *)
name*: Name;
typ*: Struct;
marked*: BOOLEAN; (* marked objects are exported *)
mode*: INTEGER; (* identifies object kind *)
mnolev*: INTEGER; (* module numbers are <= 0 *)
...
(* object specific data *)
END;
StructDesc* = RECORD
ref: INTEGER; (* used as mark field *)
form*: INTEGER; (* identifies structure kind *)
obj*: Object; (* points to type object if it exists *)
len*, size*: LONGINT;
base*: Struct; (* result-, element-, base- or pointee type *)
link*: Object (* record scope *)
END;
Struct nodes describe the type of an object (array, record, pointer,
procedure); a few types are predefined (e.g. Char, Integer, Real).
Because types may be recursively defined, the resulting data structure
may contain cycles. Many objects may have the same type and therefore
refer to the same struct node.
2.2 Export
The graph linearization algorithm in the form described in section 1
actually works for different node kinds (determined by the tag field)
but requires that all nodes are described using the same record.
However, enforcing this precondition would have a far too strong impact
on the structure of a compiler. As we have actually two different
record types for object and type description, the adequate solution is
to have two sets of read and write procedures, one set for objects and
one for types. This distinction is even more justified by the fact that
objects are stored in a simple linear list and are referenced only once
(with the exception of type and module objects) while type descriptors
may be referenced several times and potentially belong to cycles.
References from struct nodes to their type objects are handled directly
in the WriteStruct procedure. Modules are never marked for export and
written using a special procedure. As a consequence, the write
procedure which traverses the object scope does not have to take care of
objects already traversed and therefore no marking is necessary. Hence,
the write procedure degenerates to a simple list traversal while for the
corresponding read procedure no temporary array is necessary. Using the
above definitions, the WriteObjects procedure can be written as follows
(primitive operations like WriteInt and WriteName may be found in the
appendix section):
PROCEDURE WriteObjects(obj: Object);
BEGIN
WHILE obj # NIL DO
IF obj.marked THEN
WriteInt(obj.mode);
IF (obj.mode # Type) OR (obj.typ.obj # obj) THEN
(* no-type or alias type *)
WriteName(obj.name)
ELSE (* other type *) Write(0X)
END;
WriteStruct(obj.typ);
IF obj.mode = Const THEN (* write const value *)
ELSIF obj.mode = LProc THEN (* write parameter list *)
...
END
END;
obj := obj.next
END;
WriteInt(Undef) (* termination tag *)
END WriteObjects;
For exported objects, the mode which describes the object kind, the
object's name and its type are written. As an optimization, the name of
types is only written if it concerns alias types. Otherwise, the name
will be written by the WriteStruct procedure (given below). Then,
depending on the object mode, additional information is written (e.g.
the value of constants or the parameter list of procedures).
The WriteStruct procedure closely resembles the Write algorithm
described in 1.2 but is slightly complicated by the fact that named
types (i.e. types for which a type object exists) have to be handled
correctly. Remember, that it is not correct to simply call the
WriteObjects procedure, because WriteObjects is not built to handle more
than one reference to an object. In addition, named types (and only
these!) may be exported and imported over many modules and it must
always be guaranteed that a type, whether it was imported via several
modules ( indirect import ) or not, is described by one and only one
struct node. A unique identification is necessary, which is the type
name combined with the description of the module where the type was
defined first. An analogous situation occurs when a type gets an alias
name: then, the struct node always points to the first type object which
defined the type.
VAR
nofStructs: INTEGER; (* structure number; initially = 1 *)
PROCEDURE WriteStruct(typ: Struct);
VAR name: Name;
BEGIN
IF typ = NIL THEN WriteInt(0)
ELSIF typ.ref > 0 THEN (* already marked *) WriteInt(-typ.ref)
ELSE (* first occurence *)
WriteInt(typ.form); typ.ref := nofStructs; INC(nofStructs);
IF typ.obj # NIL THEN (* named type *)
name := typ.obj.name;
IF ~typ.obj.marked THEN (* invisible type *)
name[0] := CHR(ORD(name[0]) - ORD("@"))
END;
WriteName(name); WriteMod(GMod[-typ.obj.mnolev])
ELSE Write(0X)
END;
CASE typ.form OF
| Proc: (* write parameter list and result type *)
| Array: (* write element type and length *)
...
END
END
END WriteStruct;
Note that for exported named types there is a distinction between those
with and those without corresponding exported type objects, where the
latter are called invisible types. Invisible types must not be visible
in an importing module, i.e. a programmer is not allowed to use them by
name within a declaration. On the other hand they have to be visible to
the compiler because it must be ensured that all invisible types with
the same name are mapped onto a common struct node. Hence, the same
information as for named types is written, but a simple trick inhibits
any use of the type name within a program: the first letter of its name
(which is always greater or equal than "A") is modified such that the
name becomes a syntactically invalid identifier.
For named types the declaring module has to be known. Because several
types may be exported by the same module, the corresponding module
object may be referenced more than once. A special procedure which
handles modules correctly is used:
VAR
nofLMods: INTEGER; (* local module number; initially = 0 *)
PROCEDURE WriteMod(mod: Object);
BEGIN
IF mod.ref < 0 THEN (* first occurence *)
mod.ref := nofLMods; INC(nofLMods);
WriteInt(Mod); WriteKey(mod.key); WriteName(mod.name)
ELSE WriteInt(-mod.ref)
END
END WriteMod;
Like struct nodes, module descriptors are only written at their very
first occurence. Whenever the same module is referenced later, only its
reference number is written. Module pointers are never NIL, so numbering
starts with zero and a module is marked iff mod.ref >= 0. During import
the compiler has to check that only one version of a module is used
globally, therefore every module needs a unique key. Remember that
modules are never written out accidentally by the WriteObjects
procedure, because they are never marked for export.
Writing the complete symbol file requires writing out the module
descriptor of the compiled module (by means of WriteMod) followed by
writing out all its exported objects (by means of WriteObjects).
2.3 Import
Like the general ReadNode procedure is mirroring the structure of the
WriteNode procedure, the import procedures are similar to the export
procedures. This fact allows for easy extension, because additionally
written data in an export procedure automatically leads to the
corresponding read operations in the import procedure. Nevertheless, a
few difficulties need to be mastered anyway.
In several situations an object may be imported more than once: the
trivial case occurs when the same module is imported twice because of a
substitution (or alias) name. One might expect that this special case
should be handled by simply ignoring a second import; but actually a
multiple import of objects has to be handled anyway, hence double import
of entire modules is only a special case of a more general situation.
Multiple import of an object normally occurs for type objects, when
beeing imported indirectly across different modules. Consider the
following situation: a module A exports a type (object) T which is used
in a variable V of a module B. Module C which imports A and B
consequentially also imports the type T from A and the variable V from B
and hence once again the type T as type of V (Fig. 2).
Fig. 2
Double import
However, multiply loading objects or structures, if not detected, could
lead to incorrect incompatibilities during type checking. Therefore,
each object which is read from the symbol file is discarded if it was
already present in the symbol table. Hence, each object is represented
by its very first loaded instance which is called primary instance
[Gu85]. In consequence, the ReadObjects procedure reads an object from
the symbol file but the procedure InsertImport inserts it in the
corresponding module scope only if the object (with the same name) has
not already been imported. As an optimization, note that only alias
types have to be (additionally) inserted with InsertImport because their
names differ from their type object names. All other types are handled
in the ReadStruct procedure.
PROCEDURE InsertImport(VAR obj: Object; scope: Object);
VAR p, q: Object;
BEGIN
p := scope; q := scope.next;
WHILE q # NIL DO
IF q.name = obj.name THEN obj := q; RETURN END;
p := q; q := q.next
END;
obj.mnolev := scope.mnolev; p.next := obj
END InsertImport;
PROCEDURE ReadObjects(scope: Object);
VAR mode: LONGINT; obj: Object; typ: Struct; name: Name;
BEGIN
LOOP (* read all objects *)
ReadInt(mode);
IF mode = Undef THEN (* no more objects *) EXIT
ELSIF mode = Type THEN
ReadName(name);
IF name # "" THEN (* alias type *)
NewObject(obj, name, Type); ReadStruct(obj.typ);
InsertImport(obj, scope)
ELSE (* other types *) ReadStruct(typ)
END
ELSE
ReadName(name); NewObject(obj, name, SHORT(mode));
ReadStruct(obj.typ);
IF mode = Const THEN (* read const value *)
ELSIF obj.mode = LProc THEN (* read parameter list *)
...
END;
InsertImport(obj, scope)
END
END
END ReadObjects;
The procedure ReadStruct decides upon reading the first number form
whether the structure was already read from the (same) symbol file or
not. In the first case, the structure already built is taken out of a
local structure table LStruct which serves as a translation table for
structure references analogous to NodeTab in the ReadNode procedure. In
the second case, the structure is created and inserted in the LStruct
table, then the specific structure information is read. As described in
ReadObjects, named types must also be ignored if occuring a second time:
the structure information is read but then discarded (by means of
InsertImport) and the primary instance is used (and also inserted in the
LStruct table).
As a tricky point, notice further: Because each type (named or not) may
refer to itself, it is absolutely necessary that the primary instance is
found before any other types are read (which potentially refer to the
same type and therefore would obtain the wrong structure out of the
LStruct table). Hence, no other types must occur between the type tag of
a named type and its identification, i.e. its name and its original
module description. This rule corresponds to postulate 5 in [Gu85].
VAR
nofStructs: INTEGER;
(* structure number; initially = 1 *)
LStruct: ARRAY MaxNofStructs OF Struct;
(* local structure table; LStruct[0] = NIL *)
PROCEDURE ReadStruct(VAR typ: Struct);
VAR form: LONGINT; htyp: Struct; name: Name; obj, mod: Object;
BEGIN
ReadInt(form);
IF form <= 0 THEN (* struct reference or NIL *) typ := LStruct[-form]
ELSE (* first occurence *)
NewStruct(htyp, SHORT(form)); ReadName(name);
IF name # "" THEN (* named type *)
NewObject(obj, name, Type); obj.marked := TRUE; obj.typ := htyp;
htyp.obj := obj; ReadMod(mod); InsertImport(obj, mod.link);
typ := obj.typ
ELSE typ := htyp
END;
LStruct[nofStructs] := typ; INC(nofStructs);
CASE form OF
| Proc: (* read parameter list and result type *)
| Array: (* read element type and length *)
...
END
END
END ReadStruct;
At the end, the ReadMod procedure is presented. As expected, a local
module table LMod is used as translation table for module references. In
addition, because modules must always be represented by their primary
instances, a global module table GMod is necessary, which is accessed in
the InsertMod procedure.
VAR
nofGMods*: INTEGER; (* global module number; initially = 0 *)
GMod*: ARRAY MaxNofGMods OF Object; (* global module table *)
PROCEDURE InsertMod*
(VAR mod: Object; VAR name: ARRAY OF CHAR; key: LONGINT);
VAR i: INTEGER;
BEGIN
i := 0;
WHILE (i < nofGMods) & (name # GMod[i].name) DO INC(i) END;
IF i < nofGMods THEN (* module already imported *)
mod := GMod[i];
IF mod.key # key THEN err(150) (* key inconsistency *) END
ELSE
NewObject(mod, name, Mod); (* must not be visible in global scope *)
mod.key := key; mod.mnolev := -nofGMods;
OpenScope(mod.link, mod.mnolev); CloseScope; (* allocate own scope *)
GMod[nofGMods] := mod; INC(nofGMods)
END
END InsertMod;
VAR
nofLMods: INTEGER; (* local module number; initially = 0 *)
LMod: ARRAY MaxNofLMods OF Object; (* local module table *)
PROCEDURE ReadMod(VAR mod: Object);
VAR ref, key: LONGINT; name: Name;
BEGIN
ReadInt(ref);
IF ref > 0 THEN (* first occurence *)
ReadKey(key); ReadName(name);
InsertMod(mod, name, key);
LMod[nofLMods] := mod; INC(nofLMods)
ELSE mod := LMod[-ref]
END
END ReadMod;
Importing a whole module requires reading the module (by means of
ReadMod) followed by reading all exported objects of this module (by
means of ReadObjects). Together there are only a few additional rules to
the general graph algorithm to be observed:
1. For types and modules , which both may be referenced several times
within the same symbol file , a marking scheme and a translation table
analogous to the general graph read/write algorithms is used.
2. For named types and modules , which both may occur in several symbol
files , the primary instance always must be used and inserted into the
local translation tables. If the primary instance already exists, the
remaining data must be read but then discarded. Repeated occurence of
the same module is detected using a global module table, repeated
occurence of the same named type is detected using its module and the
corresponding module scope.
3. As a consequence of the second rule, no other types must occur in the
symbol file before name and module of a named type are specified.
2.4 Implementation Aspects
Symbol File Representation
As mentioned in the beginning, symbol files need not to be represented
by a separate file. Actually, a symbol file independent of the
corresponding object file mirrors the fact that in Modula-2 and other
languages the definition and the implementation of a module were
compiled separately: the definition module was compiled into a symbol
file and the implementation module into an object file. If exported
objects are marked in some way within the implementation module (as in
Oberon), a definition module and hence a separate symbol file is not
necessary. It is more adequate to produce only a single file during
compilation, which contains all the necessary information about the
module including its interface description. For example, the
(conventional) symbol file may be appended to the object file. The
advantages are obvious: only a single file has to be distributed, this
file is always self-contained and therefore consistent and it is
possible to directly generate the interface specification out of an
object file with a Browser tool. Should it be necessary to inhibit
public use of a module (e.g. because it supports low-level features), it
is easy to remove the symbol file from the object file and distribute
the interface-less object file only.
Canonical Form for Symbol Files
When a module has to be recompiled for some reason without a change in
its interface, the old module key should be used again in the symbol
file. Otherwise all depending modules would also be invalidated and
would have to be recompiled. A simple method to check whether a module's
interface has changed or not is to bytewise compare the new symbol file
against the old one. If it has not changed, the old key can be reused.
This comparison method requires very stable symbol files: e.g. the
ordering of the exported objects should have no effect on the symbol
file. This is clearly not fulfilled in the implementation described
above (which we've chosen for simplicity), because objects are ordered
in the linear list according to their occurence in the module. Desired
is a kind of canonical form for symbol files.
As we mentioned earlier, access to a special object in the symbol table
should be as efficient as possible, so one could implement the table
using binary trees instead of (unsorted) linear lists. Then, the objects
could be written in alphabetical order to the symbol file using an
in-order traversal of the tree. Symbol files in this canonical form are
invariant to any changes in the ordering of a module's objects.
Unfortunately this implies an undesired side effect which destroys the
efficiency of binary trees: Importing a symbol file means inserting the
imported objects in binary trees. Because the objects are alphabetically
sorted, the symbol table tree degenerates to a linear list. Further,
most of the objects in typical modules are known by import. In
consequence, searching in the symbol table means actually searching in
linear lists. Hence, the additional implementation effort for simple
binary trees may be no longer justified.
The situation may be improved by modifing the proposed canonical form.
Instead of writing all objects in alphabetical order, groups of objects
of the same kind (with the same mode) are written alphabetically: first
all constants, then the variables, then types, and so on. Within each
group the objects are ordered and the group ordering is also specified.
This is another canonical form which is also invariant against any
permutations of exported objects. When imported, the trees will not
completely degenerate but are decomposed into several partial lists. The
result is at least a better balanced tree. Note that the addional effort
of traversing the tree in several passes (for each object group) is only
necessary during export; the more time critical import procedure is not
at all affected.
Predefined Types
In several languages there exist predefined types which are known to the
compiler. Such types are always exported using a fixed reference number.
Before any object is imported, they are inserted into the local
structure table by the import procedure (see Appendix B).
Increased Import Speed
As measurements show (see below), when reading is bytewise, the import
speed highly depends on the symbol file length. Besides strings, most of
the data on a symbol file are integers. Hence, a principal goal should
be to reduce the space used to write a single integer number. Integers
occur frequently as tag or reference numbers and most of them are very
small (with an absolute value less than 64). Nevertheless also bigger
integers should be managed easily. The ReadInt and WriteInt procedures
described in [Te90] allow integers to be written in a
machine-independent format which uses only one byte per number in most
cases.
2.5 Measurements
The following measurements show lengths and reading times of symbol
files. The basic modules of the Oberon System are choosen as a typical
"module mix". The method described here (used in module CCT) is compared
to the method used in the existing portable Oberon compilers (module
OPT, see [Cre90]) which is essentially the same method as described in
[Gu85]. To be fair, the CCT method was adjusted such that about the same
information per object was written out as with OPT (e.g. the address of
each parameter of a procedure) but using a compactifying WriteInt
procedure. The measurements were made on a Ceres-2 computer [He88] with
a NS32532 CPU running at 25 Mhz clock speed.
In the left table the lengths of the symbol files in bytes are shown
(OPT lengths are 100%). On average, the CCT symbol files are about 25%
shorter than the corresponding OPT files. When the same symbol files are
written using a non-compactifying WriteInt procedure in CCT (for
addresses only), the files are about 8 % shorter (measurements not shown
here). The right table shows the reading times in ms for each symbol
file (OPT times are 100%). For this, each file was imported 100 times
and only the pure file reading time without directory operations was
measured and the average taken. The actual time spent in the import
procedures is much higher because of directory accesses and may vary
even for the same file.
Tab. 2
Further analysis of the measurements shows what was presumed, namely a
practically linear dependency between the file lengths and their reading
times (Tab. 2); the linear regression coefficient r is nearly 1.0 for
both methods. Although the average reading time per byte in CCT is
larger than in OPT, the shorter file lengths made up for this
difference. For comparison, the values for pure file reading are also
shown (the reading time per byte is the average measured time for this
file mix and is only valid for short files in general). In every case
the file length completely dominates other factors, so to improve
importing speed shorter symbol files have to be achieved. But
nevertheless, such an effort is only justified if the file system offers
some kind of caching strategy for files already open. Otherwise the
reading time for such short files is negligible compared to the
directory access time. The import and export procedures in CCT are about
20 % shorter in source code size than the OPT procedures (Tab. 2).
3. Summary
As we have seen, the linearization algorithm for general graphs is very
simple and easily adjusted to similar problems. In the example of symbol
files it leads to a clean and understandable solution. The simplicity of
the algorithm and the fact that only local invariants have to be
considered, allows symbol files to be easily extended. The main
difference between the algorithm described here and the one described in
[Gu85] is that here a pre-order traversal instead of a post-order
traversal is used. Using a post-order traversal, several postulates have
to be guaranteed all the time, which complicate the algorithm.
Nevertheless, cyclic references within types are a problem which has to
be handled in a special way. Although the post-order algorithm requires
no recursion for reading, the presented algorithm is faster in the
average because symbol files are shorter and time used for reading in
todays computers is determined mostly by the length of the files which
are to be processed.
Acknowledgements
I would like to thank Josef Templ, Cuno Pfister, Clemens Szyperski and
H. Mössenböck. Josef answered tricky questions about symbol files; he
and Cuno worked as proof readers. H. Mössenböck added valuable comments
to the modified algorithm in Appendix A. Not to forget, the entire paper
was written using the excellent Write text editor of Clemens.
References
[Cre90] R. Crelier. OP2: A Portable Oberon Compiler . Computersysteme
ETH Zürich, Technical Report No. 125, February 1990.
[Gri90] R. Griesemer. Seneca - A Language for Numerical Computations on
Vectorcomputers . Conpar 90 Proceedings, Volume on special technical
contributions, Zürich, September 1990.
[Gu85] J. Gutknecht. Compilation of Data Structures: A New Approach to
Efficient Modula-2 Symbol Files . Computersysteme ETH Zürich, Technical
Report No. 64, July 1985.
[PfiHeTe91] C. Pfister, B. Heeb, J. Templ. Oberon Technical Notes .
Companion paper.
[ReMö86] P. Rechenberg, H. Mössenböck. An Algorithm for the Linear
Storage of Dynamic Data Structures . Internal Paper, University of Linz,
Austria, 1986.
[Te90] J. Templ. SPARC-Oberon. User's Guide and Implementation.
Computersysteme ETH Zürich, Technical Report No. 133, June 1990.
[Wi85] N. Wirth. A Fast and Compact Compiler for Modula-2 .
Computersysteme ETH Zürich, Technical Report No. 64, July 1985.
[Wi88] N. Wirth. The Programming Language Oberon . Software - Practice
and Experience, 18, 7, July 1988 and Computersysteme ETH Zürich,
Technical Report No. 143, November 1990.
Appendix A: A modified Linearization Algorithm
When working with rather large graphs consisting of several thousands or
even millions of nodes, it might be impractical to use a translation
table NodeTab of about the same size as the graph for reading. If only
multiple referenced nodes had to be stored in the NodeTab array, this
translation table could be much smaller. This can be achieved by a split
of the write phase into two subphases. As a desirable side effect, after
writing, all preconditions for writing are established again, i.e. no
unmarking is necessary. The modifications are described shortly:
Writing :
In the first pass all nodes are traversed and the mark field is used as
reference counter (procedure Mark). For the mark field of every
(reachable) node the following holds: In the second pass, every node
that is referenced more than once (i.e. mark > 1) is written using a
special tag and its negative node number is stored in its mark field
(which now again fulfills the preconditions for writing).
Reading:
A negative node tag designates an already read node. A positive node tag
specifies a node which occurs only once (tag is even) or a node which
will be referenced again (tag is odd) and therefore must be stored in
the NodeTab array.
TYPE
Node = POINTER TO NodeDesc;
NodeDesc = RECORD
mark: INTEGER; (* used for linearization; initially < 0 *)
tag: INTEGER; (* determines node type; tag > 0 *)
data: ARRAY M(tag) OF INTEGER; (* N depends on the node type (tag) *)
link: ARRAY N(tag) OF Node (* M depends on the node type (tag) *)
END;
VAR
NofNodes: INTEGER; (* node number; initially = 1 *)
NodeTab: ARRAY MaxRefs OF Node;
(* node table; contains each node referenced more than once *)
PROCEDURE Mark(q: Node); (* Pass 1 *)
(* precondition: q: q.mark < 0 *)
VAR i: INTEGER;
BEGIN
IF q # NIL THEN
IF q.mark < 0 THEN (* first occurence *) q.mark := 1;
i := 0; WHILE i < Ntag DO Mark(q.link[i]); INC(i) END
ELSE INC(q.mark)
END
END
END Mark;
(* postcondition: q: q is marked *)
PROCEDURE WriteNode(q: Node); (* Pass 2 *)
(* precondition: (q: q is marked) (NofNodes = 1) *)
VAR i: INTEGER;
BEGIN
IF q = NIL THEN WriteInt(0)
ELSIF q.mark < 0 THEN (* already marked *) WriteInt(q.mark)
ELSE (* first occurence *)
IF q.mark = 1 THEN (* node occurs only once *)
WriteInt(q.tag*2); q.mark := -1
ELSE (* node is referenced several times *)
WriteInt(q.tag*2 + 1); q.mark := -NofNode; INC(NofNodes)
END;
i := 0; WHILE i < Mtag DO WriteInt(q.data[i]); INC(i) END;
i := 0; WHILE i < Ntag DO WriteNode(q.link[i]); INC(i) END
END
END WriteNode;
(* postcondition: q: q.mark < 0 *)
PROCEDURE ReadNode(VAR q: Node);
(* precondition: (NodeTab[0] = NIL) (NofNodes = 1) *)
VAR tag, i: INTEGER;
BEGIN
ReadInt(tag);
IF tag <= 0 THEN (* node reference *) q := NodeTab[-tag]
ELSE (* first occurence *)
NEW(q, tag DIV 2);
IF ODD(tag) THEN (* node is referenced several times *)
NodeTab[NofNodes] := q; INC(NofNodes)
END;
i := 0; WHILE i < Mtag DO ReadInt(q.data[i]); INC(i) END;
i := 0; WHILE i < Ntag DO ReadNode(q.link[i]); INC(i) END
END
END ReadNode;
Appendix B: Import / Export in Detail
In the following an extract of a table handler with the complete import
/ export procedures is shown. A few points are specific to this
implementation and hence commented accordingly.
CONST
(* implementation restrictions *)
MaxNofGMods = 32; MaxNofLMods = 24; MaxNofStructs = 256;
(* object modes *)
Undef* = 0; Scope = 1; Const* = 2; Type* = 3; Var* = 4; VarPar* = 5;
XVar* = 6; IVar* = 7; Field* = 8; LProc* = 9; XProc* = 10;
IProc* = 11; SProc* = 12; Mod* = 13;
(* type forms *)
Char* = 2; Bool* = 3; SInt* = 4; Int* = 5; LInt* = 6; Set* = 7;
Real* = 8; LReal* = 9; Cmplx* = 10; LCmplx* = 11; NilTyp* = 13;
NoTyp* = 14; Proc* = 15; String* = 16; Array* = 17; DynArr* = 18;
Record* = 19; Pointer* = 20;
TYPE
Name* = ARRAY 32 OF CHAR;
Object* = POINTER TO ObjectDesc;
Struct* = POINTER TO StructDesc;
ObjectDesc* = RECORD
link*, next*: Object; (* objects are chained using next *)
name* : Name;
typ* : Struct;
marked* : BOOLEAN; (* marked objects are exported *)
leave* : BOOLEAN;
mode* : INTEGER; (* identifies object kind *)
mnolev* : INTEGER; (* module numbers are <= 0 *)
a0*, a1* : LONGINT; (* object specific data *)
b0*, b1* : LONGINT (* object specific data *)
END;
StructDesc* = RECORD
ref : INTEGER; (* used as mark field *)
form* : INTEGER; (* identifies structure kind *)
obj* : Object; (* points to type object if it exists *)
len*, size*: LONGINT;
base* : Struct; (* result-, element-, base- or pointee type *)
link* : Object (* record scope *)
END;
VAR
system, universe: Object; (* predefined scopes *)
topScope*, undefObj*: Object; (* current topScope, error object *)
firstStructRef: INTEGER;
nofGMods*: INTEGER;
GMod*: ARRAY MaxNofGMods OF Object; (* global module table *)
(* predefined types *)
undefTyp*, noTyp*, stringTyp*, boolTyp*, charTyp*, sIntTyp*, intTyp*,
lIntTyp*, setTyp*, realTyp*, lRealTyp*, cmplxTyp*, lCmplxTyp*: Struct;
PROCEDURE err(no: INTEGER);
(* Displays an error message *)
(* general table handling *)
PROCEDURE NewObject*(VAR obj: Object; VAR name: ARRAY OF CHAR; mode: INTEGER);
(* Creates a new object and initializes its fields *)
PROCEDURE NewStruct*(VAR str: Struct; form: INTEGER);
(* Creates a new structure and initializes its fields *)
PROCEDURE OpenScope*(VAR scope: Object; mnolev: INTEGER);
(* Opens a new scope if scope is NIL; otherwise the old scope is
reopened *)
PROCEDURE CloseScope*;
(* Closes topScope *)
PROCEDURE Insert*(VAR obj: Object; name: ARRAY OF CHAR; mode: INTEGER);
VAR p, q: Object;
BEGIN
p := topScope; q := topScope.next;
WHILE q # NIL DO
IF q.name = name THEN err(1) END;
p := q; q := q.next
END;
NewObject(obj, name, mode); obj.mnolev := topScope.mnolev; p.next := obj
END Insert;
(* import table handling *)
PROCEDURE InsertMod*(VAR mod: Object; VAR name: ARRAY OF CHAR; key: LONGINT);
VAR i: INTEGER;
BEGIN
i := 0;
WHILE (i < nofGMods) & (name # GMod[i].name) DO INC(i) END;
IF i < nofGMods THEN (* module already imported *)
mod := GMod[i];
IF mod.b0 # key THEN err(150) (* key inconsistency *) END
ELSE
NewObject(mod, name, Mod); (* must not be visible in global scope *)
mod.b0 := key; mod.mnolev := -nofGMods;
OpenScope(mod.link, mod.mnolev); CloseScope; (* allocate own scope *)
IF nofGMods < MaxNofGMods THEN GMod[nofGMods] := mod; INC(nofGMods)
ELSE err(227) (* to many imported modules *)
END
END
END InsertMod;
PROCEDURE InsertImport(VAR obj: Object; scope: Object);
VAR p, q: Object;
BEGIN
p := scope; q := scope.next;
WHILE q # NIL DO
IF q.name = obj.name THEN obj := q; RETURN END;
p := q; q := q.next
END;
obj.mnolev := scope.mnolev; p.next := obj
END InsertImport;
(* import *)
PROCEDURE OpenRider(VAR R: Files.Rider; name: ARRAY OF CHAR; VAR res: INTEGER);
(* Sets a rider to the beginning of the symbol file *)
PROCEDURE Import*(VAR substName, impName, selfName: ARRAY OF CHAR);
VAR
R: Files.Rider;
mod0, mod: Object;
res, nofLMods, nofStructs: INTEGER;
LMod: ARRAY MaxNofLMods OF Object;
LStruct: ARRAY MaxNofStructs OF Struct;
PROCEDURE^ ReadStruct(VAR typ: Struct);
PROCEDURE ReadInt(VAR i: LONGINT);
(* Reads integers written in a compacted form [Te90] *)
VAR n: LONGINT; s: INTEGER; x: CHAR;
BEGIN
s := 0; n := 0; Files.Read(R, x);
WHILE ORD(x) >= 128 DO
INC(n, ASH(ORD(x) - 128, s)); INC(s, 7); Files.Read(R, x)
END;
i := n + ASH(ORD(x) MOD 64 - ORD(x) DIV 64 * 64, s)
END ReadInt;
PROCEDURE ReadName(VAR name: ARRAY OF CHAR);
(* Reads a name terminated with 0X *)
PROCEDURE ReadString(VAR pos, len: LONGINT);
(* Reads a string terminated with 0X into the scanner string buffer *)
PROCEDURE ReadKey(VAR x: LONGINT);
(* Reads an integer in uncompacted form *)
PROCEDURE ReadMod(VAR mod: Object);
VAR ref, key: LONGINT; name: Name;
BEGIN
ReadInt(ref);
IF ref > 0 THEN (* first occurence *)
ReadKey(key); ReadName(name);
IF name = selfName THEN err(49) END;
InsertMod(mod, name, key);
IF nofLMods < MaxNofLMods THEN LMod[nofLMods] := mod; INC(nofLMods)
ELSE err(227) (* to many imported modules *)
END
ELSE mod := LMod[-ref]
END
END ReadMod;
PROCEDURE ReadFields(VAR field: Object);
VAR obj, last: Object; name: Name;
BEGIN
field := NIL;
LOOP
ReadName(name);
IF name = "" THEN EXIT END;
NewObject(obj, name, Field); ReadStruct(obj.typ); ReadInt(obj.a0);
IF field = NIL THEN field := obj ELSE last.next := obj END;
last := obj
END
END ReadFields;
PROCEDURE ReadFP(VAR par: Object; VAR nofArgs: LONGINT);
VAR obj, last: Object; typ: Struct; n, a0, a1: LONGINT; name: Name;
BEGIN
par := NIL; ReadInt(nofArgs); n := nofArgs;
WHILE n > 0 DO
ReadName(name); ReadStruct(typ); ReadInt(a0); ReadInt(a1);
IF ODD(a1) THEN NewObject(obj, name, VarPar)
ELSE NewObject(obj, name, Var)
END;
obj.typ := typ; obj.a0 := a0; obj.a1 := a1 DIV 2;
IF par = NIL THEN par := obj ELSE last.next := obj END;
last := obj; DEC(n)
END
END ReadFP;
PROCEDURE ReadStruct(VAR typ: Struct);
VAR form: LONGINT; htyp: Struct; name: Name; obj, mod: Object;
BEGIN
ReadInt(form);
IF form <= 0 THEN (* struct reference or NIL *) typ := LStruct[-form]
ELSE (* first occurence *)
NewStruct(htyp, SHORT(form)); ReadName(name);
IF name # "" THEN (* named type *)
NewObject(obj, name, Type); obj.marked := TRUE; obj.typ := htyp;
htyp.obj := obj; ReadMod(mod); InsertImport(obj, mod.link);
typ := obj.typ
ELSE typ := htyp
END;
IF nofStructs < MaxNofStructs THEN
LStruct[nofStructs] := typ; INC(nofStructs)
ELSE err(228) (* to many imported types *)
END;
ReadStruct(htyp.base);
CASE form OF
| Proc:
OpenScope(htyp.link, 0); ReadFP(htyp.link.next, htyp.len);
CloseScope; htyp.size := 1
| Array: ReadInt(htyp.len); ReadInt(htyp.size)
| DynArr: ReadInt(htyp.size)
| Record:
OpenScope(htyp.link, 0); ReadFields(htyp.link.next); CloseScope;
ReadInt(htyp.size);
IF htyp.base # NIL THEN
htyp.len := htyp.base.len+1; htyp.link.link := htyp.base.link
ELSE htyp.len := 0
END
| Pointer: typ.size := 1
END
END
END ReadStruct;
PROCEDURE ReadObjects(scope: Object);
VAR mode: LONGINT; obj: Object; typ: Struct; name: Name;
BEGIN
LOOP (* read all objects *)
ReadInt(mode);
IF mode = Undef THEN EXIT
ELSIF mode = Type THEN
ReadName(name);
IF name # "" THEN (* alias type *)
NewObject(obj, name, Type); ReadStruct(obj.typ);
InsertImport(obj, scope)
ELSE (* other types *) ReadStruct(typ)
END
ELSE
ReadName(name); NewObject(obj, name, SHORT(mode));
ReadStruct(obj.typ);
IF mode = Const THEN
IF obj.typ.form = String THEN ReadString(obj.b0, obj.b1)
ELSIF obj.typ.form = LReal THEN ReadInt(obj.b0); ReadInt(obj.b1)
ELSE ReadInt(obj.b0)
END
ELSIF obj.mode = Var THEN ReadInt(obj.a0); obj.mode := XVar
ELSIF obj.mode = LProc THEN
OpenScope(obj.link, 0); ReadFP(obj.link.next, obj.b0); CloseScope;
obj.mode := XProc
ELSIF obj.mode = IProc THEN
OpenScope(obj.link, 0); ReadFP(obj.link.next, obj.b0); CloseScope
END;
InsertImport(obj, scope)
END
END
END ReadObjects;
PROCEDURE EnterStruct(typ: Struct);
(* Enters a predefined type into the LStruct table *)
BEGIN
IF impName = "SYSTEM" THEN Insert(mod, impName, Mod); mod.link := system
ELSE OpenRider(R, impName, res);
IF res < 0 THEN (*
no error occured
*)
LStruct[0] := NIL; EnterStruct(stringTyp);
EnterStruct(undefTyp); EnterStruct(noTyp); EnterStruct(boolTyp); EnterStruct(charTyp);
EnterStruct(sIntTyp); EnterStruct(intTyp); EnterStruct(lIntTyp); EnterStruct(setTyp);
EnterStruct(realTyp); EnterStruct(lRealTyp); EnterStruct(cmplxTyp); EnterStruct(lCmplxTyp);
nofLMods := 0; nofStructs := firstStructRef;
ReadMod(mod0); ReadObjects(mod0.link);
Insert(mod, substName, Mod);
mod.b0 := mod0.b0; mod.mnolev := mod0.mnolev; mod.link := mod0.link
ELSE err(res)
END
END
END Import;
(* export *)
PROCEDURE Export*
(mod: Object; VAR buf: ARRAY OF CHAR; VAR len: LONGINT; VAR new: BOOLEAN);
VAR pos: LONGINT; nofLMods, nofStructs: INTEGER;
PROCEDURE^ WriteStruct(typ: Struct);
PROCEDURE Write(ch: CHAR);
BEGIN buf[pos] := ch; INC(pos)
END Write;
PROCEDURE WriteInt(i: LONGINT);
(* Writes integers in a compacted form [Te90] *)
BEGIN
WHILE (i < -64) OR (i > 63) DO
Write(CHR(i MOD 128 + 128)); i := i DIV 128
END;
Write(CHR(i MOD 128))
END WriteInt;
PROCEDURE WriteName(VAR name: ARRAY OF CHAR);
(* Writes a name terminated with 0X *)
PROCEDURE WriteString(pos: LONGINT);
(* Writes the string terminated with 0X at position pos of the scanner
string buffer *)
PROCEDURE WriteKey(x: LONGINT);
(* Writes an integer in uncompacted form *)
PROCEDURE WriteMod(mod: Object);
BEGIN
IF mod.b1 < 0 THEN (* first occurence *)
mod.b1 := nofLMods; INC(nofLMods);
WriteInt(Mod); WriteKey(mod.b0); WriteName(mod.name)
ELSE WriteInt(-mod.b1)
END
END WriteMod;
PROCEDURE WriteFields(field: Object);
BEGIN
WHILE field # NIL DO
IF field.marked THEN
WriteName(field.name); WriteStruct(field.typ); WriteInt(field.a0)
END;
field := field.next
END;
Write(0X)
END WriteFields;
PROCEDURE WriteFP(par: Object; nofArgs: LONGINT);
BEGIN
WriteInt(nofArgs);
WHILE nofArgs > 0 DO
WriteName(par.name); WriteStruct(par.typ); WriteInt(par.a0);
IF par.mode = VarPar THEN WriteInt(par.a1*2 + 1)
ELSE WriteInt(par.a1*2) END;
par := par.next; DEC(nofArgs)
END
END WriteFP;
PROCEDURE WriteStruct(typ: Struct);
VAR name: Name;
BEGIN
IF typ = NIL THEN WriteInt(0)
ELSIF typ.ref > 0 THEN (* already marked *) WriteInt(-typ.ref)
ELSE (* first occurence *)
WriteInt(typ.form); typ.ref := nofStructs; INC(nofStructs);
IF typ.obj # NIL THEN (* named type *)
name := typ.obj.name;
IF ~typ.obj.marked THEN (* invisible type *)
name[0] := CHR(ORD(name[0]) - ORD("@"))
END;
WriteName(name); WriteMod(GMod[-typ.obj.mnolev])
ELSE Write(0X)
END;
WriteStruct(typ.base);
CASE typ.form OF
| Proc: WriteFP(typ.link.next, typ.len)
| Array: WriteInt(typ.len); WriteInt(typ.size)
| DynArr: WriteInt(typ.size)
| Record: WriteFields(typ.link.next); WriteInt(typ.size)
| Pointer:
END
END
END WriteStruct;
PROCEDURE WriteObjects(obj: Object);
BEGIN
WHILE obj # NIL DO
IF obj.marked THEN
WriteInt(obj.mode);
IF (obj.mode # Type) OR (obj.typ.obj # obj) THEN
(* no-type or alias type *) WriteName(obj.name)
ELSE (* other type *) Write(0X)
END;
WriteStruct(obj.typ);
IF obj.mode = Const THEN
IF obj.typ.form = String THEN WriteString(obj.b0)
ELSIF obj.typ.form = LReal THEN WriteInt(obj.b0); WriteInt(obj.b1)
ELSE WriteInt(obj.b0)
END
ELSIF obj.mode = Var THEN WriteInt(obj.a0)
ELSIF obj.mode IN {LProc, IProc} THEN WriteFP(obj.link.next, obj.b0)
END
END;
obj := obj.next
END;
WriteInt(Undef) (* termination tag *)
END WriteObjects;
PROCEDURE Compare(VAR buf: ARRAY OF CHAR; len: LONGINT; VAR new: BOOLEAN);
VAR res: INTEGER; i: LONGINT; R: Files.Rider; ch: CHAR;
prefix: ARRAY 5 OF CHAR;
BEGIN
OpenRider(R, mod.name, res); new := TRUE;
IF res < 0 THEN
Files.ReadBytes(R, prefix, LEN(prefix)); Files.Read(R, ch);
i := LEN(prefix);
WHILE (i < len) & (buf[i] = ch) DO Files.Read(R, ch); INC(i) END;
IF i = len THEN (* same symbol table => use old key *)
i := 0; WHILE i < LEN(prefix) DO buf[i] := prefix[i]; INC(i) END;
new := FALSE
END
END
END Compare;
BEGIN
pos := 0; nofLMods := 0; nofStructs := firstStructRef;
WriteMod(mod); WriteObjects(mod.link.next);
Compare(buf, pos, new)
END Export;